Abstract

The conventional approach to routing in computer networks consists
of using a heuristic to compute a single shortest path from a source
to a destination. Single-path routing is very responsive to
topological and link-cost changes; however, except under light
traffic loads, the delays obtained with this type of routing are far
from optimal. Furthermore, if link costs are associated with delays,
single-path routing exhibits oscillatory behavior and becomes
unstable as traffic loads increase. On the other hand, minimum-delay
routing approaches can minimize delays only when traffic is
stationary or very slowly changing.

We present a “near-optimal” routing framework that offers delays
comparable to those of optimal routing and that is as flexible and
responsive as single-path routing protocols proposed to date. First,
an approximation to Gallager’s minimum-delay routing problem is
derived, and then algorithms that implement the approximation scheme
are presented and verified. We introduce the first routing algorithm
based on link-state information that provides multiple paths of
unequal cost to each destination that are loop-free at every
instant. We show through simulations that the delays obtained in our
framework are comparable to those obtained using Gallager’s
minimum-delay routing. Also, we show that our framework renders far
smaller delays and makes better use of resources than traditional
single-path routing.

1  Introduction 

The standard approach to routing in computer networks today consists
of computing a single shortest path from a source to each
destination using some heuristic link-cost metric, which is
typically not directly associated with the transmission and queueing
delays over links and paths. A less common approach to routing is
that of defining the routing problem as an optimization problem
(e.g., multicommodity problem [5]) with a specific objective
function, such as minimizing delays or maximizing throughput, and
solving the problem using any of several known optimization
techniques. These two traditional approaches to routing have
inherent strengths and drawbacks.

In order to provide minimum delays, all optimal routing algorithms
require the input traffic and the network topology to be stationary
or very slowly changing (quasi-static), and require a priori
knowledge of global constants that guarantee convergence of the
routing algorithm. This makes optimal routing algorithms impractical
for real networks, because in real networks traffic is very bursty
at any time scale and the network topology frequently experiences
changes. Moreover, global constants that work for all input traffic
patterns are impossible to determine.

On the other hand, routing algorithms based on single shortest-path
heuristics adapt very quickly to changing network conditions, making
them far preferable to optimal routing for implementation in real
networks. The main shortcoming of single shortest-path routing is
that the delays achievable with such heuristics are far longer than
those achievable using optimal routing algorithms. In addition,
single-shortest-path routing becomes unstable under heavy loads or
very bursty traffic when the link-cost metric used in the routing
algorithm is related to the delays or congestion experienced over
the links [3].

The fact that shortest-path routing over single paths is far less
efficient than optimal dynamic routing, and the oscillatory behavior
of shortest-path routing when link costs are tied to link delays,
have been known for many years. However, implementing optimal
dynamic routing in a computer network has simply been infeasible to
date. The key contributions of this paper consist of: (a)
introducing a new framework for near-optimum delay routing; (b)
verifying, for the first time, a set of invariants that permit
routing-algorithm designers to approximate Gallager’s necessary and
sufficient conditions for minimum-delay routing with loop-free
routing conditions that can be achieved using distributed routing
algorithms that do not require any global variables or global
synchronization; and (c) showing an example that provides end-to-end
delays that are comparable to the optimal, while being as fast as
today’s shortest-path routing schemes.

Section 2 presents the minimum-delay routing problem (MDRP) as
described by Gallager, and Gallager’s minimum-delay routing
algorithm [8]. Gallager’s algorithm is unsuitable for practical
networks and internetworks, because its speed of convergence to the
optimal routes depends on a global constant, and because it requires
that the input traffic and network topology be stationary or
quasi-stationary.

Several algorithms have been proposed to date that improve over
Gallager’s minimum-delay routing algorithm [2, 6, 23, 24]. Segall
and Sidi [23, 24] extended Gallager’s minimum-delay routing
algorithm to handle topological changes using techniques developed
by Merlin and Segall [19]. Cassandras et al. [6] present a better
technique for measuring marginal delays. Bertsekas and Gallager [2]
used second derivatives to speed up convergence of Gallager’s
algorithm. However, all these algorithms are still dependent on
global constants and the requirement that network traffic be static
or quasi-static.

Because of its oscillatory behavior when link costs are related to
delays, attempts to improve shortest-path routing have been
restricted mainly to using better link-cost metrics (e.g., [18, 13])
or using multiple paths. To avoid undetected loops, OSPF permits
multiple paths to a destination only when they have the same length
[20]. More recently, Zaumen and Garcia-Luna-Aceves [27] proposed an
algorithm based on distance vectors that supports multiple paths of
unequal costs to each destination; however, link costs are not tied
to delays. Wang and Crowcroft [26] addressed the drawbacks of the
shortest-path first (SPF) algorithm by using alternate paths to
detour traffic around points of congestion or network failures.
However, the alternate paths in SPF-EE (for emergency exits) are
computed on a reactive basis, i.e., once congestion occurs, which is
less effective in dealing with short bursts of traffic.

Cain et al. [4] describe a routing algorithm for minimizing delays.
However, this algorithm requires that the routing-table updates at
all the routers be synchronized; otherwise looping occurs, which
increases end-to-end delays. Because the synchronization intervals
required by this algorithm must be known by all routers, this is
akin to using a global constant as in Gallager’s algorithm. This
approach is not scalable to very large networks, because the time
needed for routing-table update synchronization becomes large, and
this in turn limits its responsiveness to short-term traffic
fluctuations. What is seriously lacking in this algorithm is a
technique for asynchronous computation of multiple paths with
instantaneous loop-freedom.

Section 3 presents a new framework for approximate solutions to
MDRP. The novelty of this framework stems from partitioning the
computation of minimum-delay paths into two parts. First, multiple
loop-free paths of unequal cost to a destination are established
using long-term link-cost information. This is followed by the
allocation of flows to destinations along the multiple loop-free
paths available at each router; such an allocation is based on
heuristics that attempt to minimize delays using short-term
link-cost information. It is this partitioning of MDRP that permits
us to implement routing algorithms that provide routers with
near-optimum delays while keeping the routing algorithm as
responsive to traffic or topology changes as the best of today’s
shortest-path routing algorithms. A set of invariants is also
presented that permits Gallager’s necessary and sufficient
conditions for minimum-delay routing to be approximated with
loop-free routing conditions achievable with simple distributed
routing algorithms that do not require any global variables or
global synchronization.

Section 4 describes a specific routing algorithm based on our new
routing framework. This algorithm consists of two key components:
(a) the first link-state routing algorithm that provides multiple
loop-free paths of arbitrary positive cost at every instant, and (b)
flow allocation heuristics that approximate minimum delays along the
predefined multiple loop-free paths available for each destination.

Section 5 presents results of simulation experiments designed to
illustrate the effectiveness of our solution in static and dynamic
networks. We compare our approach against the optimal routing
approach and shortest-path routing based on Dijkstra’s shortest-path
first (SPF) algorithm, because it is used widely in the Internet
today. The simulation results illustrate that the routing delays
obtained with our new algorithm are comparable to the optimal
delays. Furthermore, the complexity of implementing our routing
framework is similar to the complexity of routing protocols that
provide single-path routing in the Internet today.


Let $f_{ik}$ be the expected traffic, measured in bits per second,
on link $(i,k)$. Because $t_j^i \phi_{jk}^i$ is the traffic destined
for router $j$ on link $(i,k)$, we have the following equation to
find $f_{ik}$:

$$f_{ik} = \sum_{j \in N} t_j^i \phi_{jk}^i \qquad (2)$$

Note that $0 \le f_{ik} \le C_{ik}$, where $C_{ik}$ is the capacity
of link $(i,k)$ in bits per second.
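Equations (1) and (2) can be evaluated numerically once $r$ and $\phi$ are fixed. The following is a minimal sketch rather than part of the algorithm described in this paper: the dictionary layout, the function names `node_flows` and `link_flows`, and the three-router example are assumptions made for illustration. The repeated-substitution loop assumes loop-free routing, so it settles after at most as many sweeps as there are routers.

```python
def node_flows(nodes, r, phi):
    """Solve Eq. (1) by repeated substitution.

    r[(i, j)]      : input traffic r_j^i entering at i for destination j
    phi[(i, j, k)] : fraction phi_jk^i of i's traffic for j sent over (i, k)
    Assumes loop-free routing, so len(nodes) sweeps are enough.
    """
    t = {(i, j): r.get((i, j), 0.0) for i in nodes for j in nodes}
    for _ in range(len(nodes)):
        t = {
            (i, j): r.get((i, j), 0.0)
            + sum(t[(k, j)] * phi.get((k, j, i), 0.0) for k in nodes)
            for i in nodes
            for j in nodes
        }
    return t


def link_flows(t, phi):
    """Eq. (2): f_ik is the sum over destinations j of t_j^i * phi_jk^i."""
    f = {}
    for (i, j, k), fraction in phi.items():
        f[(i, k)] = f.get((i, k), 0.0) + t[(i, j)] * fraction
    return f


# Hypothetical example: a and b both send to c; a splits its traffic evenly.
nodes = ["a", "b", "c"]
r = {("a", "c"): 10.0, ("b", "c"): 4.0}
phi = {("a", "c", "b"): 0.5, ("a", "c", "c"): 0.5, ("b", "c", "c"): 1.0}
t = node_flows(nodes, r, phi)
f = link_flows(t, phi)
print(t[("b", "c")], f)
```

In this example router b carries its own 4 units plus half of a's 10 units, so the flow on link (b, c) is 9.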

Property 1 For each router $i$ and destination $j$, the routing
parameters $\phi_{jk}^i$ must satisfy the following conditions:

1. $\phi_{jk}^i = 0$ if $(i,k) \notin L$ or $i = j$. Clearly, if the
link does not exist, there can be no traffic on it.

2. $\phi_{jk}^i \ge 0$. This is true, because there can be no
negative amount of traffic allocated on a link.

3. $\sum_{k \in N^i} \phi_{jk}^i = 1$. This is a consequence of the
fact that all incoming traffic must be allocated to outgoing links.
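Property 1 is easy to check mechanically at a single router. The sketch below is illustrative only: the function name `check_routing_parameters` and the representation of $\phi_{jk}^i$ as a dictionary from neighbor to fraction are assumptions, not notation from the paper.

```python
import math


def check_routing_parameters(i, j, phi_ij, neighbors_i):
    """Return True if phi_ij satisfies Property 1 at router i for destination j.

    phi_ij      : dict mapping a node k to the fraction phi_jk^i
    neighbors_i : set of neighbors of i (the nodes k with (i, k) in L)
    """
    if i == j:
        # Condition 1: the destination forwards nothing for itself.
        return all(v == 0 for v in phi_ij.values())
    if any(k not in neighbors_i and v != 0 for k, v in phi_ij.items()):
        return False  # Condition 1: no traffic on a non-existent link.
    if any(v < 0 for v in phi_ij.values()):
        return False  # Condition 2: fractions are non-negative.
    # Condition 3: all traffic is allocated to outgoing links.
    return math.isclose(sum(phi_ij.values()), 1.0)


print(check_routing_parameters("a", "d", {"b": 0.25, "c": 0.75}, {"b", "c"}))
```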

Let $D_{ik}$ be defined as the expected number of messages or
packets per second transmitted on link $(i,k)$ times the expected
delay per message or packet, including the queueing delays at the
link. We assume that messages are delayed only by the links of the
network and that $D_{ik}$ depends only on the flow $f_{ik}$ through
link $(i,k)$ and on link characteristics such as propagation delay
and link capacity. $D_{ik}(f_{ik})$ is a continuous and convex
function that tends to infinity as $f_{ik}$ approaches $C_{ik}$. The
total expected delay per message times the total expected number of
message arrivals per second is given by

$$D_T = \sum_{(i,k) \in L} D_{ik}(f_{ik}) \qquad (3)$$
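To make Eq. (3) concrete, the sketch below evaluates the total delay for a small set of link flows. The text only requires $D_{ik}$ to be continuous, convex, and unbounded as the flow approaches capacity; the specific form $f/(C - f)$ used here is an assumed, commonly used queueing-style choice, and the names `link_delay` and `total_delay` are hypothetical.

```python
def link_delay(flow, capacity):
    """Assumed delay function D_ik(f_ik) = f / (C - f).

    It is convex in f and grows without bound as f approaches C, which is
    all that the problem formulation requires of D_ik.
    """
    if not 0 <= flow < capacity:
        raise ValueError("flow must satisfy 0 <= flow < capacity")
    return flow / (capacity - flow)


def total_delay(flows, capacities):
    """Eq. (3): sum the per-link delay terms over all links."""
    return sum(link_delay(f, capacities[link]) for link, f in flows.items())


# Hypothetical flows on three links, each with capacity 10.
flows = {("a", "b"): 5.0, ("a", "c"): 5.0, ("b", "c"): 9.0}
capacities = {link: 10.0 for link in flows}
print(total_delay(flows, capacities))
```

The heavily loaded link (b, c) dominates the sum, which is the effect that load balancing over multiple paths tries to reduce.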

Note that the router traffic-flow set $t = \{t_j^i\}$ and the
link-flow set $f = \{f_{ik}\}$ can be obtained from $r = \{r_j^i\}$
and $\phi = \{\phi_{jk}^i\}$. Therefore, $D_T$ can be expressed as a
function of $r$ and $\phi$ using Eqs. (1) and (2). The minimum-delay
routing problem can now be stated as follows:


2  Minimum  Delay  Routing 
2.1  Problem  formulation 

The minimum-delay routing problem (MDRP) was first formulated by
Gallager [8], and we provide the same description in this section. A
computer network $G = (N, L)$ is made up of $N$ routers and $L$
links between them. Each link is bidirectional with possibly
different costs in each direction.

Let $r_j^i \ge 0$ be the expected input traffic, measured in bits
per second, entering the network at router $i$ and destined for
router $j$. Let $t_j^i$ be the sum of $r_j^i$ and the traffic
arriving from the neighbors of $i$ for destination $j$. And let the
routing parameter $\phi_{jk}^i$ be the fraction of traffic $t_j^i$
that leaves router $i$ over link $(i,k)$. Assuming that the network
does not lose any packets, from conservation of traffic we have

$$t_j^i = r_j^i + \sum_{k \in N^i} t_j^k \phi_{ji}^k \qquad (1)$$

where $N^i$ is the set of neighbors of router $i$.

MDRP: For a given fixed topology and input traffic flow set
$r = \{r_j^i\}$, and delay function $D_{ik}(f_{ik})$ for each link
$(i,k)$, the minimization problem consists of computing the routing
parameter set $\phi = \{\phi_{jk}^i\}$ such that the total expected
delay $D_T$ is minimized.


2.2  A  Minimum  Delay  Routing  Algorithm 

Gallager [8] derived the necessary and sufficient conditions that
must be satisfied to solve MDRP. These conditions are summarized in
Gallager’s Theorem stated below.

The partial derivatives of the total delay, $D_T$, of Eq. (3) with
respect to $r$ and $\phi$ play a key role in the formulation and
solution of the problem; these derivatives are:

$$\frac{\partial D_T}{\partial r_j^i} = \sum_{k \in N^i} \phi_{jk}^i \left[ D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \right] \qquad (4)$$

$$\frac{\partial D_T}{\partial \phi_{jk}^i} = t_j^i \left[ D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \right] \qquad (5)$$

where $D'_{ik}(f_{ik}) = \partial D_{ik}(f_{ik}) / \partial f_{ik}$
is called the marginal delay or incremental delay.

Similarly, $\partial D_T / \partial r_j^i$ is called the marginal
distance from router $i$ to $j$.

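Gallager’s Theorem [8]: The necessary condition for a minimum of
$D_T$ with respect to $\phi$ for all $i \ne j$ and $(i,k) \in L$ is

$$\frac{\partial D_T}{\partial \phi_{jk}^i} \begin{cases} = \lambda_{ij}, & \phi_{jk}^i > 0 \\ \ge \lambda_{ij}, & \phi_{jk}^i = 0 \end{cases} \qquad (6)$$

where $\lambda_{ij}$ is some positive number, and the sufficient
condition to minimize $D_T$ with respect to $\phi$ for all
$i \ne j$ and $(i,k) \in L$ is

$$D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \ge \frac{\partial D_T}{\partial r_j^i} \qquad (7)$$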

Eq. (4) shows the relation between a router’s marginal distance to a
particular destination and the marginal distances of its neighbors
to the same destination. Eqs. (5)-(7) indicate the conditions for
perfect load balancing, i.e., when the routing parameter set $\phi$
gives the minimum delay.

The set of neighbors through which router $i$ forwards traffic
towards $j$ is denoted by $S_j^i$ and is called the successor set.¹

Under perfect load balancing with respect to a particular
destination, the marginal distances through neighbors in the
successor set are equal to the marginal distance of the router, and
the marginal distances through neighbors not in the successor set
are higher than the marginal distance of the router.

Let $D_j^i$ denote the marginal distance from $i$ to $j$, i.e.,
$\partial D_T / \partial r_j^i$. Let the marginal delay
$D'_{ik}(f_{ik})$ from $i$ to $k$ be denoted by $l_k^i$, which is
also called the cost of the link from $i$ to $k$.

According to Gallager’s Theorem, the minimum-delay routing problem
now becomes one of determining, at each router $i$ for each
destination $j$, the routing parameters $\{\phi_{jk}^i\}$, $S_j^i$
and $D_j^i$, such that the following five equations are satisfied:


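$$D_j^i = \sum_{k \in N^i} \phi_{jk}^i (D_j^k + l_k^i) \qquad (8)$$

$$S_j^i = \{ k \mid \phi_{jk}^i > 0 \wedge k \in N^i \} \qquad (9)$$

$$D_j^i \le D_j^k + l_k^i, \quad k \in N^i \qquad (10)$$

$$(D_j^p + l_p^i) = (D_j^q + l_q^i), \quad p, q \in S_j^i \qquad (11)$$

$$(D_j^p + l_p^i) < (D_j^q + l_q^i), \quad p \in S_j^i,\ q \notin S_j^i \qquad (12)$$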
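The five conditions in Eqs. (8)-(12) can be checked locally at one router for one destination. The sketch below is an illustration under assumed data structures (dictionaries keyed by neighbor) and an assumed function name, `check_load_balance`; it is not Gallager's algorithm, which computes the parameters iteratively.

```python
import math


def check_load_balance(phi, d_neighbor, cost, tol=1e-9):
    """Evaluate Eqs. (8)-(12) at router i for a fixed destination j.

    phi[k]        : routing parameter phi_jk^i
    d_neighbor[k] : marginal distance D_j^k of neighbor k
    cost[k]       : link cost l_k^i
    Returns (D_j^i, successor set, True if Eqs. (10)-(12) hold).
    """
    through = {k: d_neighbor[k] + cost[k] for k in cost}      # D_j^k + l_k^i
    d_i = sum(phi[k] * through[k] for k in cost)              # Eq. (8)
    successors = {k for k in cost if phi[k] > 0}              # Eq. (9)
    ok10 = all(d_i <= through[k] + tol for k in cost)
    # Eq. (11): all successors offer the same distance, which then equals D_j^i.
    ok11 = all(math.isclose(through[p], d_i, abs_tol=tol) for p in successors)
    # Eq. (12): every non-successor offers a strictly larger distance.
    ok12 = all(through[q] > d_i + tol for q in cost if q not in successors)
    return d_i, successors, ok10 and ok11 and ok12


d_neighbor = {"b": 2.0, "c": 1.0, "d": 4.0}
cost = {"b": 1.0, "c": 2.0, "d": 1.0}
balanced = check_load_balance({"b": 0.5, "c": 0.5, "d": 0.0}, d_neighbor, cost)
skewed = check_load_balance({"b": 0.5, "c": 0.0, "d": 0.5}, d_neighbor, cost)
print(balanced, skewed)
```

In the balanced case both successors offer a distance of 3 and the unused neighbor offers 5. In the skewed case traffic is sent through the worse neighbor d, so the conditions fail.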

simply implied by the routing parameters in Eq. (9). The computation
of routing parameters is, for all practical purposes, a very slow
process as it is a destination-controlled process. The destination
initiates every iteration that adjusts the routing parameters at
every router; furthermore, each iteration takes a time proportional
to the diameter of the network and a number of messages proportional
to the number of links. This renders the algorithm slow converging
and useful only when traffic and topology are stationary for times
long enough for all routers to adjust their routing parameters
between changes. Also, depending on the global constant p, the
destination must initiate several iterations for the parameters to
converge to their final values. The number of such iterations needed
for convergence tends to be large for a small p, and small for a
large value of p. Unfortunately, p cannot be made arbitrarily large
to reduce the number of iterations and to speed up convergence,
because the algorithm may not converge at all for large values of p.

Hence, Gallager’s algorithm can be viewed only as a method for
obtaining lower bounds under stationary traffic, rather than as an
algorithm to be used in practice. The next section shows how the
theory introduced in Gallager’s method can be adapted to practical
networks.

3 A New Framework for Minimum-Delay Routing

We noted that in Gallager’s algorithm the computation of the routing
parameter set $\phi$ is slow converging and works only in the case
of stationary or quasi-stationary traffic. In the Internet, traffic
is rarely stationary, and perfect load balancing is neither possible
nor necessary. Intuitively, an approximate load-balancing scheme
based on some heuristic that can quickly adapt to dynamic traffic
should be sufficient to reduce delays substantially.

The key idea in our approach is, in a sense, to reverse the way in
which Gallager’s algorithm solves MDRP. The intuition behind our
approach is that establishing paths from sources to destinations
takes a much longer time than shifting loads from one set of
neighbors to another, simply because of the propagation and
processing delays incurred along the paths. Accordingly, it makes
sense to first establish multiple loop-free paths using long-term
(end-to-end) delay information, and then adjust routing parameters
along the predefined multiple paths using short-term (local) delay
information.

This new approach allows us to attempt to use distributed algorithms
to compute multiple loop-free paths from source to destination that,
hopefully, are as fast as today’s single-path routing algorithms,
and local heuristics that can respond quickly to temporary traffic
bursts using local short-term metrics alone. Therefore, we map
Eqs. (8)-(12) derived in Gallager’s method into the following three
equations:


This reformulation of MDRP is critical, because it is the first step
in allowing us to approach the problem by looking at the next-hops
and distances obtained at each router for each destination. Gallager
[8] described a distributed routing algorithm for solving the above
five equations. When the algorithm converges, the aggregate of the
successor sets for a given destination $j$ ($S_j^i$ for every $i$)
defines a directed acyclic graph. In fact, in any implementation,
$S_j^i$ must be loop-free at every instant, because even temporary
loops cause traffic to recirculate at some nodes and result in
incorrect marginal delay computations, which in turn can prevent the
algorithm from converging or obtaining minimum delays.

Gallager’s distributed algorithm uses an interesting blocking
technique to provide loop-freedom at every instant [8, 23, 24]. We
refer to this algorithm as OPT in the rest of the paper.
Unfortunately, OPT cannot be used in real networks for several
reasons. A major drawback of OPT is that a global step size p needs
to be chosen and every router must use it to ensure convergence.
Because p depends on the input traffic pattern, it is impossible to
determine one in practice that works for all input traffic patterns
and for all possible topology modifications. The routing parameters
are directly computed by OPT and the multiple loop-free paths are

1  The  term  successor  set  was  first  introduced  in  [27]. 


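$$D_j^i = \min \{ D_j^k + l_k^i \mid k \in N^i \} \qquad (13)$$

$$S_j^i = \{ k \mid D_j^k < D_j^i \wedge k \in N^i \} \qquad (14)$$

$$\phi_{jk}^i = \Psi(k, A_j^i, B_j^i), \quad k \in N^i \qquad (15)$$

where $A_j^i = \{ D_j^p + l_p^i \mid p \in N^i \}$ and
$B_j^i = \{ \phi_{jp}^i \mid p \in N^i \}$.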

These equations simply state that, for an algorithm to approximate
minimum-delay routing, it must establish loop-free paths and use a
function $\Psi$ to allocate flows over those paths. We observe that
Eq. (13) is the well-known Bellman-Ford (BF) equation for computing
the shortest paths, and Eq. (14) is the successor set consisting of
the neighbors that are closer to the destination than the router
itself. Note that the paths implied by the neighbors in the
successor set of a router need not be of the same length. The
function $\Psi$ in Eq. (15) is a heuristic function that determines
the routing parameters. Because changing the routing parameters
affects the marginal delay of the links (and hence the link costs),
we use regular updates of the link costs.
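Eqs. (13) and (14) can be illustrated with a centralized computation on a small graph. This is only a sketch under the assumption that link costs are globally known and consistent, which is precisely what a real network cannot rely on; the function names and the example topology are hypothetical.

```python
def shortest_distances(links, dest):
    """Bellman-Ford relaxation of Eq. (13).

    links maps a directed link (i, k) to its cost l_k^i.
    Returns D_j^i for every node i, with j = dest.
    """
    nodes = {n for link in links for n in link}
    dist = {n: float("inf") for n in nodes}
    dist[dest] = 0.0
    for _ in range(len(nodes) - 1):
        for (i, k), cost in links.items():
            if dist[k] + cost < dist[i]:
                dist[i] = dist[k] + cost
    return dist


def successor_sets(links, dist):
    """Eq. (14): neighbors of i that are strictly closer to the destination."""
    successors = {n: set() for n in dist}
    for (i, k) in links:
        if dist[k] < dist[i]:
            successors[i].add(k)
    return successors


# Hypothetical bidirectional topology with symmetric costs.
undirected = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 1.0,
              ("b", "d"): 3.0, ("c", "d"): 1.0}
links = {}
for (u, v), c in undirected.items():
    links[(u, v)] = c
    links[(v, u)] = c

dist = shortest_distances(links, "d")
succ = successor_sets(links, dist)
print(dist, succ)
```

Router b ends up with successors c and d even though the path through d costs 3 and the path through c costs 2, which shows that the paths in a successor set need not have equal length.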

The main problem with attempting to solve MDRP using Eqs. (13) to
(15) directly is that these equations assume that routing
information is consistent throughout the network. In practice, a
node (router) must choose its distance and successor set using
routing information obtained through its neighbors, and this
information may be outdated. At any time $t$, for a particular
destination $j$, the successor sets of all nodes define a routing
graph $SG_j(t) = \{ (m,n) \mid n \in S_j^m(t),\ m \in N \}$. In
single-path routing, $S_j^i(t)$ has at most one neighbor: the
neighbor that is on the shortest path to destination $j$.
Accordingly, $SG_j(t)$ for single-path routing is a sink-tree rooted
at $j$ if loops are never created. The routing graph $SG_j(t)$ in
our case should be a directed acyclic graph in order for minimum
delays to be approached.
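Whether a given collection of successor sets forms a directed acyclic graph can be tested with a depth-first search. The sketch below is a global, offline check written for illustration; the function names are assumptions, and a distributed routing algorithm has no such global view.

```python
def routing_graph(successors):
    """Build SG_j as a set of directed edges (m, n) with n in S_j^m."""
    return {(m, n) for m, succ in successors.items() for n in succ}


def is_loop_free(successors):
    """Return True if the routing graph implied by the successor sets is a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {m: WHITE for m in successors}

    def visit(m):
        color[m] = GREY
        for n in successors.get(m, ()):
            state = color.get(n, WHITE)
            if state == GREY:             # edge back into the current path: a loop
                return False
            if state == WHITE and not visit(n):
                return False
        color[m] = BLACK
        return True

    return all(color[m] != WHITE or visit(m) for m in list(successors))


dag = {"a": {"b", "c"}, "b": {"c", "d"}, "c": {"d"}, "d": set()}
loop = {"a": {"b"}, "b": {"a"}}
print(is_loop_free(dag), is_loop_free(loop))
```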

The  blocking  technique  used  in  Gallager’s  algorithm  ensures 
instantaneous  loop-freedom.  Likewise,  to  provide  loop-free  paths 
even  when  the  network  is  in  transient  state  within  the  context  of  our 
framework,  additional  constraints  must  be  imposed  on  the  choice  of 
successors  at  each  router,  which  essentially  must  preclude  the  use 
of  neighbors  that  may  lead  to  looping. 

Several algorithms have been proposed in the past to provide
loop-free paths at every instant for the case of single-path routing
(e.g., the Jaffe-Moss algorithm [15], DUAL [9], LPA [11], and the
Merlin-Segall algorithm [19]) and one algorithm, DASM, has been
proposed for the case of multiple paths per destination [27]. All
these algorithms are based on the exchange of vectors of distances,
together with some form of coordination among routers spanning one
or multiple hops. The coordination among routers determines when the
routers can update their routing tables. This coordination is in
turn guided by local conditions that depend on values of reported
distances to destinations and that are sufficient to prevent loops
from occurring.

We generalize the work to date on loop-free routing over single
paths or multiple paths by means of the following loop-free
invariant (LFI) conditions, which are applicable to any type of
routing algorithm. We adopt the same terminology and nomenclature
first introduced for DUAL [9] to describe the LFI conditions.

Loop-free Invariant (LFI) conditions: Any routing algorithm designed
such that the following two equations are always satisfied
automatically provides loop-free paths at every instant, regardless
of the type of routing algorithm being used:

$$FD_j^i \le D_{ji}^k, \quad k \in N^i \qquad (16)$$

$$S_j^i = \{ k \mid D_{jk}^i < FD_j^i \wedge k \in N^i \} \qquad (17)$$

where $D_{jk}^i$ is the value of $D_j^k$ reported to $i$ by its
neighbor $k$; and $FD_j^i$ is called the feasible distance of router
$i$ for destination $j$ and is an estimate of $D_j^i$, in the sense
that $FD_j^i$ equals $D_j^i$ in steady state but is allowed to
differ from it temporarily during periods of network transitions.
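The two LFI conditions translate into very small local computations. In the sketch below, `lfi_successors` applies Eq. (17) using only information held at router $i$, while `satisfies_eq16` is a global check of Eq. (16) that is useful in a simulation but not something a router can evaluate directly, because it reads the neighbors' copies of the router's distance. Both function names and the example values are assumptions for illustration.

```python
def lfi_successors(reported, feasible_distance):
    """Eq. (17): pick neighbors whose reported distance is below FD_j^i.

    reported maps neighbor k to D_jk^i, the distance of k to j as last
    reported to router i.
    """
    return {k for k, d in reported.items() if d < feasible_distance}


def satisfies_eq16(feasible_distance, seen_by_neighbors):
    """Eq. (16): FD_j^i must not exceed the value of i's distance held at any neighbor.

    seen_by_neighbors maps neighbor k to D_ji^k.
    """
    return all(feasible_distance <= d for d in seen_by_neighbors.values())


reported = {"b": 2.0, "c": 1.0, "e": 3.0}
print(lfi_successors(reported, feasible_distance=3.0))
print(satisfies_eq16(3.0, {"b": 3.0, "c": 3.0, "e": 4.0}))
```

Neighbor e is excluded because its reported distance is not strictly smaller than the feasible distance, which is what rules out two routers choosing each other as successors.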

In link-state algorithms, the values of $D_{jk}^i$ are determined
locally from the link-state information supplied by the router’s
neighbors; in contrast, in distance-vector algorithms, the distances
are directly communicated among neighbors. The following theorem
verifies this key result of our framework.

Theorem 1 If the LFI conditions are satisfied at any time $t$, the
routing graph $SG_j(t)$ implied by the successor sets $S_j^i(t)$ is
loop-free.

Proof: Let $k \in S_j^i(t)$; then from Eq. (17) we have

$$D_{jk}^i(t) < FD_j^i(t) \qquad (18)$$

At router $k$, because router $i$ is a neighbor, from Eq. (16) we
have $FD_j^k(t) \le D_{jk}^i(t)$. Combining this result with
Eq. (18) we obtain

$$FD_j^k(t) < FD_j^i(t) \qquad (19)$$

Eq. (19) states that, if $k$ is a successor of router $i$ in a path
to destination $j$, then $k$’s feasible distance to $j$ is strictly
less than the feasible distance of router $i$ to $j$. Now, if the
successor sets define a loop at time $t$ with respect to $j$, then
for some router $p$ on the loop we arrive at
$FD_j^p(t) < FD_j^p(t)$, an absurd relation. Therefore, the LFI
conditions are sufficient for loop-freedom. □
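The last step of the proof can be written out as a chain of inequalities. The labels $p_0, \dots, p_n$ for the routers on the assumed loop are introduced here only for illustration; each strict inequality is one application of Eq. (19).

```latex
% Assume a loop p_0 -> p_1 -> ... -> p_n -> p_0 in SG_j(t),
% i.e. p_{m+1} \in S_j^{p_m}(t) for m = 0..n-1 and p_0 \in S_j^{p_n}(t).
\begin{align*}
FD_j^{p_0}(t) \;>\; FD_j^{p_1}(t) \;>\; \cdots \;>\; FD_j^{p_n}(t) \;>\; FD_j^{p_0}(t)
\end{align*}
% By transitivity FD_j^{p_0}(t) > FD_j^{p_0}(t), which is impossible,
% so no such loop can exist while Eqs. (16) and (17) hold.
```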

With the result of Theorem 1, Eq. (14) can be approximated with the
LFI conditions to render a routing approach that does not require
routing information to be globally consistent, at the expense of
rendering delays that may be longer than optimal. Accordingly, our
framework for near-optimum-delay routing lies in finding the
solution to the following equations using a distributed algorithm:

$$D_j^i = \min \{ D_j^k + l_k^i \mid k \in N^i \} \qquad (20)$$

$$FD_j^i \le D_{ji}^k, \quad k \in N^i \qquad (21)$$

$$S_j^i = \{ k \mid D_{jk}^i < FD_j^i \wedge k \in N^i \} \qquad (22)$$

$$\phi_{jk}^i = \Psi(k, \{ D_j^p + l_p^i \mid p \in N^i \}, \{ \phi_{jp}^i \mid p \in N^i \}), \quad k \in N^i \qquad (23)$$

4  Implementing  Near-Optimum-Delay  Routing 

We present an approach based on link-state information, rather than
distance information, because extending our results to minimum-delay
routing with additional constraints can be done more efficiently by
working with link parameters than path parameters, which are the
combination of link parameters. Our approach consists of three
components: computing multiple loop-free paths, distributing traffic
over such paths, and computing link costs.

4.1 Computing Multiple Loop-free Paths

We describe the computation of multiple loop-free paths in two
parts: computing $D_j^i$ using a shortest-path algorithm based on
link-state information, and computing $S_j^i$ by extending that
algorithm to support multiple successors along loop-free paths to
each destination.

4.1.1 Computing $D_j^i$

There are many distributed algorithms for computing shortest paths,
and any of them can be extended to provide multiple paths of equal
and unequal costs as long as the extension obeys the LFI conditions
introduced in the previous section.

The partial-topology dissemination algorithm (PDA) propagates enough
link-state information in the network, so that each router has
sufficient link-state information to compute shortest paths to all
destinations. In this respect, it is similar to other link-state
algorithms (e.g., OSPF [20], SPTA [25], LVA [10], ALP [12]). PDA
combines the best features of LVA, ALP and SPTA. As in LVA and ALP,
a router communicates to its neighbors information regarding only
those links that are part of its minimum-cost routing tree, and like
SPTA, a router validates link information based on distances to
heads of links and not on sequence numbers.

PDA assumes that a router detects the failure, recovery and
link-cost change of an adjacent link within a finite amount of time.
An underlying protocol ensures that messages transmitted over an
operational link are received correctly and in the proper sequence
within a finite time and are processed by the router one at a time
in the order received. These are the same assumptions made for
similar routing algorithms and can be easily satisfied in practice.
Each router $i$ running PDA maintains the following information:

1. The main topology table, $T^i$, stores the characteristics of
each link known to router $i$. Each entry in $T^i$ is a triplet
$[h, t, d]$, where $h$ is the head, $t$ is the tail and $d$ is the
cost of the link $h \to t$.

2. The neighbor topology table, $T_k^i$, is associated with each
neighbor $k$. The table stores the link-state information
communicated by the neighbor $k$. That is, $T_k^i$ is a time-delayed
version of $T^k$.


procedure INIT-PDA
{Invoked when the router comes up.}
begin
    Initialize all tables;
    call PDA;
end INIT-PDA

procedure PDA
{Executed at each router i. Invoked when an event occurs.}
begin
(1) call NTU;
(2) call MTU; /* Updates T^i */
(3) if (there are changes to T^i) then
        Compose an LSU message consisting of topology
        differences using add, delete and change link entries;
    endif
(4) Within a finite amount of time, send the
    LSU message to all neighbors;
end PDA


Figure  1:  The  Partial-topology  Dissemination  Algorithm 


3. The distance table stores the distances from router $i$ to each
destination based on the topology in $T^i$ and the distances from
each neighbor $k$ to each destination based on the topologies in
$T_k^i$ for each $k$. The distance of router $i$ to node $j$ in
$T^i$ is denoted by $D_j^i$; the distance from $k$ to $j$ in $T_k^i$
is denoted by $D_{jk}^i$.

4. The routing table stores, for each destination $j$, the successor
set $S_j^i$ and the feasible distance $FD_j^i$, which is used by
MPDA to enforce the LFI conditions.

5. The link table stores, for each neighbor $k$, the cost $l_k^i$ of
the adjacent link to the neighbor.

The unit of information exchanged between routers is a link-state
update (LSU) message. A router sends an LSU message containing
one or more entries, with each entry specifying addition,
deletion or change in cost of a link in the router's main topology
table $T^i$. Each entry of an LSU consists of link information in the
form of a triplet [h, t, d] where h is the head, t is the tail, and d is the
cost of the link h → t. An LSU message contains an acknowledgment
(ACK) flag for acknowledging the receipt of an LSU message
from a neighbor (used only by MPDA).
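
As an illustration of the message format described above, the following Python sketch models an LSU as a list of add, delete and change entries plus an ACK flag. The names `Op`, `LsuEntry`, `Lsu`, `diff_topologies` and `apply_lsu` are hypothetical and do not come from the paper, and a topology table is assumed to be a dictionary keyed by (head, tail).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

Link = Tuple[str, str]            # (head, tail)
Topology = Dict[Link, float]      # link -> cost


class Op(Enum):
    ADD = "add"
    DELETE = "delete"
    CHANGE = "change"


@dataclass(frozen=True)
class LsuEntry:
    op: Op
    head: str
    tail: str
    cost: float = float("inf")    # unused for DELETE entries


@dataclass
class Lsu:
    entries: List[LsuEntry] = field(default_factory=list)
    ack: bool = False             # ACK flag, used only by MPDA


def diff_topologies(old: Topology, new: Topology) -> Lsu:
    """Build an LSU that carries only the differences between two tables."""
    entries = []
    for (h, t), cost in new.items():
        if (h, t) not in old:
            entries.append(LsuEntry(Op.ADD, h, t, cost))
        elif old[(h, t)] != cost:
            entries.append(LsuEntry(Op.CHANGE, h, t, cost))
    for (h, t) in old:
        if (h, t) not in new:
            entries.append(LsuEntry(Op.DELETE, h, t))
    return Lsu(entries)


def apply_lsu(table: Topology, lsu: Lsu) -> None:
    """Apply each LSU entry to a (neighbor) topology table in place."""
    for e in lsu.entries:
        if e.op is Op.DELETE:
            table.pop((e.head, e.tail), None)
        else:
            table[(e.head, e.tail)] = e.cost
```

Applying the LSU produced by `diff_topologies(old, new)` to a copy of `old` yields `new`, which mirrors how a neighbor keeps its copy of the sender's table up to date.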

The INIT-PDA procedure in Fig. 1 initializes the tables of a
router at startup time; all variables of type distance are initialized
to infinity and those of type node are initialized to null. All successor
sets are initialized to the empty set. PDA is executed each
time an event occurs; an event can be either a receipt of an LSU
message from a neighbor or the detection of an adjacent link-cost
change. Procedure NTU (Neighbor Topology Table Update) shown
in Fig. 2 is used to process the received message and update the necessary
tables. Procedure MTU in Fig. 3 constructs the router's own
shortest-path tree from the topologies reported by its neighbors.
The new shortest-path tree obtained is compared with the previous
version to determine the differences; only the differences are then
reported to the neighbors. The router then waits for the next event
and, when it occurs, the whole process is repeated.

The algorithm MTU at router i merges the topologies $T^i_k$ and
the adjacent links $l^i_k$ to obtain $T^i$. The merge process is straightforward
if all neighbor topologies contain disjoint sets of links, but
when two or more neighbors report conflicting information regarding
a particular link, the conflict has to be resolved. Sequence numbers
may be used to distinguish between old and new link information
as in OSPF, but PDA resolves the conflict as follows. If two or
more neighbors report information of link (m, n) then the router i
should update topology table $T^i$ with link information reported by


procedure NTU
begin
(1) if (LSU message is received from a neighbor k) then
    (1a) Update neighbor table $T^i_k$. That is, add links,
         delete links or change links according to the
         specification of each entry in the LSU;
    (1b) Run Dijkstra's shortest path algorithm
         on the resulting topology $T^i_k$; /* This results in
         finding minimum distances from k to all other
         nodes in $T^i_k$. Note $T^i_k$ is a tree */
    (1c) Update $D^i_{jk}$ with new distances in $T^i_k$;
    endif
(2) if (adjacent link (i, k) is up) then
        Update $l^i_k$ and send an LSU message to the
        neighbor k with link information of all links in
        its main topology table $T^i$;
    endif
(3) if (cost of an adjacent link (i, k) changed) then
        Update $l^i_k$;
    endif
(4) if (adjacent link (i, k) failed) then
        Update $l^i_k$ and clear the table $T^i_k$;
    endif
end NTU


Figure  2:  Neighbor  Topology  Table  Update  algorithm 


the neighbor that offers the shortest distance from the router i to the
head node m of the link. Ties are broken in favor of the neighbor with
the lowest address. For adjacent links, router i itself is the head of
the link and thus has the shortest distance. Therefore, any information
about an adjacent link supplied by neighbors will be overridden
by the most current information about the link available to router
i. Dijkstra's shortest path algorithm is run on $T^i$ and only the links
that constitute the shortest-path tree are retained. Note that, because
there are potentially many shortest-path trees, ties should be
broken consistently during the run of Dijkstra's algorithm.
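
The merge rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: topology tables are dictionaries mapping (head, tail) to cost, node addresses are strings compared lexicographically, and the names `dijkstra` and `mtu_merge` are hypothetical.

```python
import heapq
from typing import Dict, Tuple

Topology = Dict[Tuple[str, str], float]   # (head, tail) -> cost


def dijkstra(links: Topology, root: str):
    """Return (distances, shortest-path-tree links) for directed links."""
    adj = {}
    for (h, t), c in sorted(links.items()):   # sorted: deterministic ties
        adj.setdefault(h, []).append((t, c))
    dist, parent, done = {root: 0.0}, {}, set()
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, c in adj.get(u, []):
            nd = d + c
            if v not in dist or nd < dist[v]:
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    tree = {(p, v): links[(p, v)] for v, p in parent.items()}
    return dist, tree


def mtu_merge(i: str, neighbor_tables: Dict[str, Topology],
              link_costs: Dict[str, float]):
    """Merge neighbor topologies and adjacent links into router i's tree."""
    # Distances from each neighbor k to the nodes in its reported topology.
    nbr_dist = {k: dijkstra(t, k)[0] for k, t in neighbor_tables.items()}
    nodes = {n for t in neighbor_tables.values() for link in t for n in link}
    merged: Topology = {}
    for j in sorted(nodes):
        if j == i:
            continue  # adjacent links come from router i's own link table
        # Preferred neighbor: smallest D_jk + l_k; ties go to lowest address.
        candidates = [(nbr_dist[k][j] + link_costs[k], k)
                      for k in neighbor_tables
                      if k in link_costs and j in nbr_dist[k]]
        if not candidates:
            continue
        _, p = min(candidates)
        for (h, t), c in neighbor_tables[p].items():
            if h == j:                      # links whose head node is j
                merged[(h, t)] = c
    for k, c in link_costs.items():         # own view of adjacent links wins
        merged[(i, k)] = c
    return dijkstra(merged, i)              # keep only shortest-path tree
```

In this sketch, conflicting reports about a link with head j are resolved by taking every link headed at j from the single neighbor that offers the shortest distance to j, which is one straightforward way to realize the rule stated above.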

In what follows, we show that PDA works correctly by showing
that the topology tables at all nodes converge to the shortest paths
within a finite time after the last link-cost change in the network.
After convergence, because there are no more changes to the topology
tables, no more LSU messages are generated.

Definitions: The n-hop minimum distance of router i to node j
in a network is the minimum distance possible using a path of n
links or fewer. A path that offers the n-hop minimum distance is
called an n-hop minimum path. If there is no path with n hops or fewer
from router i to j, then the n-hop minimum distance from i to j is
undefined. An n-hop minimum tree of a node i is a tree in which
router i is the root and every path of n hops or fewer from the root to
any other node is an n-hop minimum path. Note that there could
be more than one n-hop minimum tree.
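
To make the definition concrete, the n-hop minimum distances from a root can be computed with n rounds of Bellman-Ford relaxation. The sketch below is only an illustration of the definition; the function name `n_hop_min_distances` and the dictionary representation of links are assumptions, not part of PDA.

```python
from typing import Dict, Tuple


def n_hop_min_distances(links: Dict[Tuple[str, str], float],
                        root: str, n: int) -> Dict[str, float]:
    """Minimum distances from root using paths of at most n links.

    Nodes with no path of n links or fewer are absent from the result,
    matching the 'undefined' case in the definition.
    """
    dist = {root: 0.0}
    for _ in range(n):
        new = dict(dist)
        for (h, t), c in links.items():
            # Relax only from distances of the previous round, so each
            # round extends paths by at most one link.
            if h in dist and dist[h] + c < new.get(t, float("inf")):
                new[t] = dist[h] + c
        dist = new
    return dist
```

For the links a→b (1), b→c (1), a→c (5) and c→d (1), the 1-hop minimum distance from a to c is 5, the 2-hop minimum distance is 2, and node d first becomes reachable at n = 2 with distance 6 before settling at 3 for n = 3.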

Let G denote the final topology of the network after all link
changes occurred as seen by an omniscient observer; we use bold
font to refer to all quantities in G. Let $\mathbf{H}^i_n$ denote an n-hop minimum
tree rooted at router i in G and let $\mathbf{M}^i_n$ be the set of nodes
that are within n hops from i in $\mathbf{H}^i_n$. Let $\mathbf{D}^{i,n}_j$ denote the distance
of i to j in $\mathbf{H}^i_n$. Let $\mathbf{d}_{ij}$ be the cost of the link i → j. The notation
i ⇝ j indicates a path from i to j of zero or more links.

Property 2 From the principle of optimality (a sub-path of a shortest
path between two nodes is also a shortest path between the end
nodes of the sub-path), if H and H' are two n-hop minimum trees
rooted at router i, and M and M' are the sets of nodes that are within
n hops from i in H and H' respectively, then $M = M' = \mathbf{M}^i_n$.
Also, for each $j \in \mathbf{M}^i_n$, the length of the path i ⇝ j in both H and H'
is equal to $\mathbf{D}^{i,n}_j$. Also, $\mathbf{D}^{i,h}_j \le \mathbf{D}^{i,n}_j$ if h > n.


procedure MTU at router i
begin
(1) old$T^i$ ← $T^i$; /* Save copy */
(2) if (node j occurs in at least one of the $T^i_k$) then
        add j to the main topology table $T^i$;
    endif
(3) foreach node j in $T^i$ do
        MIN ← min{ $D^i_{jk} + l^i_k$ | k ∈ $N^i$ };
        let p be such that MIN = ($D^i_{jp} + l^i_p$);
        /* Neighbor p is the preferred neighbor for
           destination j. Ties are broken in favor of
           lower address neighbor */
    done
(4) foreach j in $T^i$ and its preferred neighbor p do
        Copy all links (j, n) from $T^i_p$ to $T^i$;
        /* i.e., copy all links in $T^i_p$ for which
           j is the head node */
    done
(5) Update $T^i$ with information of each $l^i_k$;
(6) Run Dijkstra's shortest path algorithm on $T^i$
    and remove those links in $T^i$ that are not
    part of the shortest path tree;
(7) Update $D^i_j$ with new distances in $T^i$;
(8) Compare old$T^i$ with $T^i$ and note all differences;
end MTU


Figure  3:  Main  Topology  Table  Update  Algorithm 


We say a router i knows at least the n-hop minimum tree if the
tree represented by its main topology table $T^i$ is at least an n-hop
minimum tree rooted at i in G and there are at least n nodes in $T^i$
that are reachable from the root i. Note that the links in $T^i$ that are
more than n hops away may have costs that do not agree with the link
costs in G.

Lemma 1 If a router i has the final correct costs of the adjacent
links and, for each neighbor k, the topology $T^i_k$ is an n-hop minimum
tree, then the topology $T^i$ is an (n + 1)-hop minimum tree after the
execution of MTU.

Proof: The proof is presented in the Appendix. □

Theorem 2 At each router i, the main topology $T^i$ gives the correct
shortest paths to all known destinations a finite time after the
last change in the network.

Proof: The proof is by induction on $t_n$, the global time when,
for each router i, $T^i$ is at least an n-hop minimum tree. Because the
longest loop-free path in the network has at most N − 1 links, where
N is the number of nodes in the network, $t_{N-1}$ is the time when every
router has the shortest path to every other node. We need to show
that $t_{N-1}$ is finite. The base case is $t_1$, the time when every node
has the 1-hop minimum distance; because adjacent link changes
are notified within finite time, $t_1 < \infty$. Let $t_n < \infty$ for some
n < N. Given that the propagation delays are finite, each router
will have the n-hop minimum tree of each of its neighbors within a
finite time after $t_n$. From Lemma 1, the router will then have
at least the (n + 1)-hop minimum tree within a finite time after
$t_n$. Therefore, $t_{n+1} < \infty$. By induction, we conclude that
$t_{N-1} < \infty$. □

4.1.2 Computing $S^i_j$

The LFI conditions introduced in Section 3 suggest a technique for
computing $S^i_j$ such that the implied routing graph $SG_j$ is loop-free
at every instant. To determine $FD^i_j$ in Eq. (16), router i needs
to know $D^k_{ji}$, the distance from i to node j in the topology table


procedure MPDA at router i
{Invoked when an event occurs.}
begin
(1) call NTU;
(2) if (node is in PASSIVE state) then
    (2a) call MTU; /* update $T^i$ and $D^i_j$ */
    (2b) $FD^i_j$ ← min{ $FD^i_j$, $D^i_j$ };
    endif
(3) if (node is in ACTIVE state and the
        last ACK is received) then
    (3a) temp_j ← $D^i_j$; Set node to PASSIVE state;
    (3b) call MTU to update $T^i$;
    (3c) $FD^i_j$ ← min{ temp_j, $D^i_j$ };
    endif
(4) $S^i_j$ ← { k | $D^i_{jk}$ < $FD^i_j$ };
(5) if (changes occur in $T^i$) then
        Set node to ACTIVE state;
    endif
    if (no changes occur in $T^i$ and the event is
        the last ACK) then
        Set node to PASSIVE state;
    endif
(6) if (there are changes to $T^i$) then
        Compose a new LSU with the topology
        changes expressed as add link,
        delete link and change link;
    endif
(7) if (input event received is an LSU message) then
        Add the ACK entry to newly composed LSU;
    endif
(8) Send the new LSU message;
end MPDA


Figure 4: Multiple-path Partial-topology Dissemination Algorithm
(MPDA)

$T^k_i$. Because of propagation delays, there may be discrepancies
between the main topology table $T^i$ at router i and its copy $T^k_i$
at the neighbor k. However, at time t, the topology table $T^k_i$ is a
copy of the main topology table $T^i$ at some earlier time t' < t.
Logically, if a copy of $D^i_j$ is saved each time an LSU is sent, a
feasible distance $FD^i_j$ that satisfies the LFI conditions can be found
in the history of values of $D^i_j$ that have been saved.
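
The observation above can be illustrated with a short Python sketch. It assumes, as one reading of the paragraph, that the router keeps the values of $D^i_j$ that some neighbor may still hold (the last acknowledged value and any value sent since); taking the minimum of these gives a distance that is no larger than any neighbor's copy. The function names `feasible_distance` and `successor_set` are hypothetical.

```python
from typing import Dict, Iterable, Set


def feasible_distance(possibly_held: Iterable[float]) -> float:
    """A safe FD: no larger than any copy of D^i_j a neighbor may hold."""
    return min(possibly_held)


def successor_set(neighbor_dist: Dict[str, float], fd: float) -> Set[str]:
    """Neighbors k whose reported distance D^i_jk is strictly below FD^i_j."""
    return {k for k, d in neighbor_dist.items() if d < fd}
```

For example, with saved values 12, 10 and 15 the feasible distance is 10, and among neighbors reporting distances 8, 10 and 12 only the one reporting 8 qualifies as a successor, because the inequality is strict.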

The multiple-path partial-topology dissemination algorithm, or
MPDA, shown in Fig. 4 is a modification of PDA that enforces the
LFI conditions by synchronizing the exchange of LSUs between
neighbors. In MPDA, each LSU message sent by a router is acknowledged
by all its neighbors before the router sends the next
LSU. The inter-neighbor synchronization used in MPDA spans only
a single hop, unlike the synchronization in diffusing computations
[7], which potentially spans the whole network. A router is said
to be in ACTIVE state when it is waiting for its neighbors to acknowledge
the LSU message it sent; otherwise, it is in PASSIVE
state.

Assume that, initially, all routers are in PASSIVE state with
all routers having the correct distances to all destinations. Then a
series of link-cost changes occurs in the network, causing some
or all routers to go through a sequence of PASSIVE-to-ACTIVE
and ACTIVE-to-PASSIVE state transitions, until all routers become
PASSIVE with correct distances to destinations.

If a router in PASSIVE state receives an event that does not
change its topology $T^i$, then the router has nothing to report and
remains in PASSIVE state. However, if a router in PASSIVE
state receives an event that causes a change in its topology, the
router sends those changes to its neighbors, goes into ACTIVE
state and waits for ACKs. Events that occur during the ACTIVE
period are processed to update $T^i_k$ and $l^i_k$ but not $T^i$; the updating


[Figure 5 diagram: a router alternating between PASSIVE and ACTIVE phases; labels: passive-to-active transitions, active-to-passive transitions, implicit transition, topology-changing events.]

Figure 5: Active-passive phase transitions in MPDA.


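procedure IH
begin
(2) if ($|S^i_j| = 1$) then
        $\phi^i_{jk} \leftarrow 1$;
    endif
(3) if ($|S^i_j| > 1$) then
        $\phi^i_{jk} \leftarrow \frac{1}{|S^i_j| - 1} \left( 1 - \frac{D^i_{jk} + l^i_k}{\sum_{m \in S^i_j} (D^i_{jm} + l^i_m)} \right), \quad \forall k \in S^i_j$;
    endif
end IH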


of $T^i$ by MTU is deferred until the end of the ACTIVE phase.
At the end of the ACTIVE phase, when ACKs from all neighbors
are received, router i updates $T^i$ with changes that may have occurred
in $T^i_k$ due to events received during the ACTIVE phase. If
no changes occurred in $T^i$ that need reporting, then the router becomes
PASSIVE; otherwise, as shown in Fig. 5, the events received
during the ACTIVE phase have changed $T^i$ and the neighbors need
to be notified. This results in a new LSU, and the router immediately
becomes ACTIVE again. In this case, there is an implicit
PASSIVE period of zero length between two back-to-back
ACTIVE periods, as illustrated in Fig. 5. A router i receiving
an LSU message from k must send back an LSU with the ACK flag
set after updating $T^i_k$. If the router does not have any updates to
send, either because it is in ACTIVE state or because it does not
have any changes to report, it sends back an empty LSU with just
the ACK flag set. When a router detects that an adjacent link failed,
any pending ACKs from the neighbor at the other end of the link
are treated as received. Because all LSUs are acknowledged within
a finite time, no deadlocks can occur.
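
The ACTIVE/PASSIVE behavior described above can be sketched for a single destination j as a small state machine. This is a simplified model under stated assumptions, not the authors' implementation: the class name `MpdaRouter`, its fields, and the `mtu_dist` argument (the distance that MTU would compute from the current neighbor tables) are hypothetical, and LSU contents are reduced to a single distance value.

```python
INF = float("inf")


class MpdaRouter:
    """Tracks D^i_j, FD^i_j and the ACTIVE/PASSIVE state for one destination."""

    def __init__(self, neighbors):
        self.neighbors = set(neighbors)
        self.waiting = set()   # neighbors whose ACK is still pending
        self.dist = INF        # D^i_j as last computed by MTU
        self.fd = INF          # feasible distance FD^i_j

    def is_active(self) -> bool:
        return bool(self.waiting)

    def handle(self, mtu_dist: float, ack_from=None) -> bool:
        """Process one event; return True if an LSU with changes is sent."""
        last_ack = False
        if ack_from in self.waiting:
            self.waiting.discard(ack_from)
            last_ack = not self.waiting
        if self.waiting:
            return False       # ACTIVE: the MTU update is deferred
        old = self.dist
        self.dist = mtu_dist   # MTU runs now
        if last_ack:
            self.fd = min(old, self.dist)      # step (3c)
        else:
            self.fd = min(self.fd, self.dist)  # step (2b)
        if self.dist != old:
            self.waiting = set(self.neighbors)  # send LSU, become ACTIVE
            return True
        return False

    def link_failed(self, k) -> None:
        """Pending ACKs from a failed neighbor are treated as received."""
        self.neighbors.discard(k)
        self.waiting.discard(k)
```

In this model the feasible distance can only decrease while the router is PASSIVE, and it is allowed to rise to the previously reported distance only once every neighbor has acknowledged the LSU that carried it.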

The  following  theorem  proves  that  MPDA  provides  loop-free 
multipaths  at  every  instant. 


Theorem 3 (Safety property) At any time t, the directed graph
$SG_j(t)$ implied by the successor sets $S^i_j(t)$ computed by MPDA
at each router is loop-free.

Proof: The proof is presented in the Appendix, and is based on
showing that $FD^i_j$ and $S^i_j$, as computed by MPDA, satisfy the LFI
conditions. □

Theorem 4 (Liveness property) A finite time after the last change
in the network, $D^i_j$ gives the correct shortest distance and

$S^i_j = \{ k \mid D^k_j < D^i_j, k \in N^i \}$ at each router i.

Proof: The convergence of MPDA follows directly from the
convergence of PDA, because the update messages in MPDA are
only delayed a finite time, as allowed in line 4 of algorithm PDA.
Therefore, the distances $D^i_j$ in MPDA also converge to the shortest
distances. Because changes to $T^i$ are always reported to the neighbors
and are incorporated by the neighbors in their tables in finite time,
$D^i_{jk} = D^k_j$ for $k \in N^i$ after convergence. From line 3c in MPDA,
we observe that $FD^i_j = D^i_j$ holds when router i becomes PASSIVE.
Because all routers are PASSIVE at convergence time,
it follows that the set $\{ k \mid D^i_{jk} < FD^i_j, k \in N^i \}$ is the same as the
set $\{ k \mid D^k_j < D^i_j, k \in N^i \}$. □

4.2  Distributing  Traffic  over  Multiple  Paths 

In general, the function $\Psi$ can be any function that satisfies
Property 1, but our objective is to obtain a function $\Psi$ that performs
load balancing that is as close as possible to perfect load balancing
(Eqs. (10)-(12)).


Figure  6:  Heuristic  for  initial  load  assignment. 


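procedure AH
begin
(1) $D^i_{j,\min} \leftarrow \min\{ D^i_{jk} + l^i_k \mid k \in S^i_j \}$;
(2) let $k_0$ be such that $D^i_{j,\min} = (D^i_{jk_0} + l^i_{k_0})$;
    /* That is, $k_0$ is the neighbor that offers this minimum */
(3) foreach $k \in S^i_j$ do
        $a^i_{jk} \leftarrow D^i_{jk} + l^i_k - D^i_{j,\min}$;
    done
(4) $\Delta \leftarrow \min\{ \phi^i_{jk} / a^i_{jk} \mid k \in S^i_j \wedge a^i_{jk} \neq 0 \}$;
(5) foreach $k \neq k_0 \wedge k \in S^i_j$ do
        $\phi^i_{jk} \leftarrow \phi^i_{jk} - \Delta \times a^i_{jk}$;
    done
(6) for $k = k_0$ do
        $\phi^i_{jk} \leftarrow \phi^i_{jk} + \sum_{q \in S^i_j} \Delta \times a^i_{jq}$;
    done
end AH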


Figure  7:  Heuristic  for  incremental  load  adjustment. 


The function $\Psi$ should also be suitable for use in dynamic networks,
where the flows over links are continuously changing, causing
continuous link-cost changes. To respond to these changes,
queueing delays at the links must be measured periodically and
routing paths must be recomputed. However, recomputing paths
frequently consumes excessive bandwidth and may also cause oscillations.
Therefore, routing-path changes should only be done
at sufficiently long intervals. Unfortunately, a network cannot be
responsive to short-term traffic bursts if only long-term updates
are performed. For this reason, we use link costs measured over
two different intervals; link costs measured over short intervals
of length $T_s$ are used for routing-parameter computation and link
costs measured over longer intervals of length $T_l$ are used for
routing-path computation [17]. In general, $T_l$ must be several times longer
than $T_s$. Long-term updates are designed to handle long-term traffic
changes and are used by the routing protocol to update the successor
sets at each router, so that the new routing paths are the shortest
paths under the new traffic conditions. The short-term updates
made every $T_s$ seconds are designed to handle short-term traffic
fluctuations that occur between long-term routing-path updates and
are used to compute the routing parameters $\phi^i_{jk}$ in Eq. (15) locally
at each router. Accordingly, our traffic distribution heuristics
assume a constant successor set and successor graph.

When $S^i_j$ is computed for the first time or recomputed due
to long-term route changes, traffic should be freshly distributed. In
this case, the allocation heuristic function $\Psi$ is a function of only
the marginal distances through the successor set. That is, Eq. (15)
reduces to the form $\{\phi^i_{jk}\} = \Psi(\{ D^i_{jp} + l^i_p \mid p \in N^i \})$. When a
new successor set $S^i_j$ is computed, algorithm IH in Fig. 6 is first
used to distribute traffic over the successor set [17]. Note that
the $\{\phi^i_{jk}\}$ computed in IH satisfy Property 1. Furthermore, when
more than one successor is present, if $D^i_{jp} + l^i_p > D^i_{jq} + l^i_q$ for
successors p and q, then $\phi^i_{jp} < \phi^i_{jq}$. The heuristic makes sense because
the greater the marginal delay through a particular neighbor
becomes, the smaller the fraction of traffic that is forwarded to that
neighbor.
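
A minimal Python sketch of the initial allocation is given below. It assumes that IH assigns to each successor k the fraction (1 − x_k / Σ x_m) / (|S| − 1), where x_k is the marginal distance through k; this formula is a reading of Fig. 6 that is consistent with the two properties stated above, and the function name `initial_allocation` is hypothetical. Marginal distances are assumed to be positive.

```python
from typing import Dict


def initial_allocation(marginal: Dict[str, float]) -> Dict[str, float]:
    """Routing parameters for a freshly computed successor set.

    marginal maps each successor k to D^i_jk + l^i_k (assumed > 0).
    """
    if not marginal:
        return {}
    if len(marginal) == 1:
        return {k: 1.0 for k in marginal}   # sole successor takes all traffic
    total = sum(marginal.values())
    n = len(marginal)
    # Larger marginal distance -> smaller share; the shares sum to one.
    return {k: (1.0 - m / total) / (n - 1) for k, m in marginal.items()}
```

With marginal distances 3 and 1 for successors p and q, the sketch assigns 0.25 of the traffic to p and 0.75 to q, so the successor with the larger marginal distance receives the smaller share.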

After the first flow assignment is made over a newly computed
successor set using algorithm IH, a different flow allocation heuristic
algorithm AH, shown in Fig. 7, is used to adjust the routing parameters
every $T_s$ seconds until the successor set changes again.
The heuristic function $\Psi$ computed in AH is incremental and, unlike
IH, is a function of the current flow allocation on the successor
sets and the marginal distances through the successors. AH also
preserves Property 1 at every instant. In AH, traffic is incrementally
moved from the links with large marginal delays to the link with
the least marginal delay. The amount of traffic moved away from
a link is proportional to how large the marginal delay of the link
is compared to the best successor link. The heuristic tends to distribute
traffic in such a way that Eqs. (10)-(12) hold true. This
is important, because the initial distribution obtained by IH is far
from being balanced. The computation complexity of the heuristic
allocation algorithms is $O(|N^i|)$. Because the heuristics are run for
each active destination, the whole load-balancing activity is O(N).
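
The incremental adjustment can be sketched in Python as follows. The sketch assumes the reading of Fig. 7 in which the step size Δ is the smallest ratio of current allocation to excess marginal distance among successors with non-zero excess; the function name `adjust_allocation` is hypothetical, and the clamp to zero is added only to guard against floating-point rounding.

```python
from typing import Dict


def adjust_allocation(phi: Dict[str, float],
                      marginal: Dict[str, float]) -> Dict[str, float]:
    """One AH step: shift traffic toward the successor with least marginal delay.

    phi maps successor k to its current routing parameter;
    marginal maps successor k to D^i_jk + l^i_k.
    """
    d_min = min(marginal.values())
    # Best successor k0; ties broken by lowest address (an assumption).
    k0 = min(k for k, m in marginal.items() if m == d_min)
    excess = {k: marginal[k] - d_min for k in marginal}
    ratios = [phi[k] / a for k, a in excess.items() if a != 0]
    if not ratios:
        return dict(phi)                 # all successors equally good
    delta = min(ratios)
    new_phi = {}
    moved = 0.0
    for k in phi:
        if k == k0:
            continue
        shift = min(phi[k], delta * excess[k])   # never go below zero
        new_phi[k] = phi[k] - shift
        moved += shift
    new_phi[k0] = phi[k0] + moved        # best successor absorbs the rest
    return new_phi
```

Starting from allocations 0.5, 0.3 and 0.2 with marginal distances 1, 2 and 5, one step yields 0.75, 0.25 and 0, so the successor with the largest excess relative to its share is drained first while the total stays equal to one.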

Unlike $\eta$ in Gallager's algorithm, $T_l$ and $T_s$ are local constants
that are set independently at each router. Convergence of our algorithm
does not critically depend on these constants as optimal
routing does on $\eta$. Also, $T_l$ and $T_s$ need not be static constants
and can be made to vary according to congestion at the router. The
value of $T_l$, however, should be sufficiently longer
than the time it takes to compute the shortest paths. The long-term
update periods should be phased randomly at each router, because
of the problems that would result from synchronization of
updates [3].

4.3  Computing  Link  Costs 

As mentioned earlier, the cost of a link is the marginal delay over
the link, $D'_{ik}(f_{ik})$.

If the links are assumed to behave like M/M/1 queues, then the
marginal delay $D'_{ik}(f_{ik})$ can be obtained in a closed-form expression
by differentiating the following equation [16]:

$$D_{ik}(f_{ik}) = \frac{f_{ik}}{C_{ik} - f_{ik}} + \tau_{ik} f_{ik} \qquad (24)$$

where $f_{ik}$ is the flow through the link (i, k), and $C_{ik}$ and $\tau_{ik}$
are the capacity and propagation delay of the link. Because the
M/M/1 assumption does not hold in practice in the presence of
very bursty traffic, and because Eq. (24) becomes unstable when
$f_{ik}$ approaches $C_{ik}$, an on-line estimation of the marginal delays is
desirable.
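
For reference, differentiating Eq. (24) with respect to the flow gives $C_{ik} / (C_{ik} - f_{ik})^2 + \tau_{ik}$. The short Python sketch below evaluates both expressions; the function names `mm1_delay` and `mm1_marginal_delay` are hypothetical, and the flow is assumed to be strictly below the link capacity.

```python
def mm1_delay(f: float, c: float, tau: float) -> float:
    """Eq. (24): total delay on a link with flow f, capacity c, prop. delay tau."""
    if not 0 <= f < c:
        raise ValueError("flow must satisfy 0 <= f < c")
    return f / (c - f) + tau * f


def mm1_marginal_delay(f: float, c: float, tau: float) -> float:
    """Derivative of Eq. (24) with respect to f: c / (c - f)^2 + tau."""
    if not 0 <= f < c:
        raise ValueError("flow must satisfy 0 <= f < c")
    return c / (c - f) ** 2 + tau
```

The marginal delay grows without bound as the flow approaches the capacity, which is the instability noted above.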

Several techniques for computing marginal delays
are currently available (e.g., [23, 22, 6]). For the purposes
of simulations, we borrow a technique introduced by Cassandras,
Abidi and Towsley [6] for on-line estimation of the marginal delay
$D'_{ik}(f_{ik})$. The technique uses perturbation analysis (PA) for the
on-line estimation and is shown to perform better than the M/M/1
estimation. In addition, the PA estimation does not require a priori
knowledge of the link capacities. This is very significant, because
the capacity available to best-effort traffic in real networks varies
according to the capacity allocated to other types of traffic, such as
real-time traffic. We must emphasize that our approach does not
depend on which specific technique is used for marginal-delay estimation,
although some methods may be better than others. In particular, the
convergence and stability of our routing algorithm do not depend
on the technique chosen.


Figure  8:  Topologies  used  in  simulations 


5  Simulations 

The simulations discussed in this section illustrate the effectiveness
of our near-optimal framework, and demonstrate the significant improvements
achieved by our approach over single-path routing in
static and dynamic environments. The delays obtained by optimal
routing, single-path routing and our approximation scheme are
compared under identical topological and traffic environments. The
results show that the average delays achieved via our approximation
scheme are comparable to those of optimal routing under a
quasi-static environment (within a small percentage difference
rather than several times larger), and are significantly better than
those of single-path routing in a dynamic environment.

For optimal routing, we implemented the algorithm described
by Gallager [8], and label it 'OPT'. The plots of our approximation
scheme are labeled 'MP'. To obtain representative
delays for single-path routing algorithms, we opted to restrict our
multipath routing algorithm to use only the best successor for packet
forwarding, instead of simulating any specific shortest-path algorithm.
Because of the instantaneous loop-freedom property that
MPDA exhibits, the shortest-path delays obtained this way are better
than or similar to the delays obtained with EIGRP [1],
which is based on DUAL and requires much more internodal synchronization
than our scheme, rendering longer delays, or with RIP [14]
or OSPF [20], which do not prevent temporary loops. We use the
label 'SP' for single-path routing in the graphs.

We performed simulations on the topologies shown in Fig. 8.
CAIRN (www.cairn.net) is a real network and NET1 is a contrived
network. We are only interested in the connectivity of CAIRN,
and its topology as used differs from the real network in the capacities
and propagation delays assumed in the simulation experiments.
We restricted the link capacities to a maximum of 10 Mbs,
so that it becomes easy to sufficiently load the networks. NET1
has a connectivity that is high enough to ensure the existence of
multiple paths, and small enough to prevent a large number of one-hop
paths. The diameter of NET1 is four and the nodes have degrees
between 3 and 5. In each network we set up flows between
several source-destination pairs and measure the average delays of
each flow. The flows in CAIRN are set up between these source-destination
pairs: (lbl, mci-r), (netstar, isie), (isi, darpa), (parc, sdsc),
(sri, mit), (tioc, sdsc), (mit, sri), (isie, netstar), (sdsc, parc), (mci-r,
tioc), (darpa, isi). For NET1, the source-destination pairs are: (9,2),
(8,3), (7,0), (6,1), (5,8), (4,1), (3,8), (2,9), (1,6), (0,7).

The flows have bandwidths in the range 0.2-1.0 Mbs. For simplicity,
we used a stable topology (links or nodes do not fail) in
all the simulations. In the presence of link failures, MP can only
perform better than SP, because of the availability of alternate paths.
Furthermore, OPT is not fast enough to respond to drastic topology
changes. Because MP is parameterized by the $T_l$ and $T_s$ update intervals,
its delay plots are represented by MP-TL-xx-TS-yy, where
xx is the $T_l$ update interval and yy is the $T_s$ update interval measured
in seconds. Similarly, the delays of shortest-path routing are
represented by SP-TL-xx, where xx is the $T_l$ update period.


Figure 9: Delays of OPT and MP in CAIRN.

Figure 10: Delays of OPT and MP in NET1.

Figure 11: Delays of MP and SP in CAIRN.


5.1  Performance  under  Stationary  Traffic 

Fig. 9 shows the average delays of flows in CAIRN for OPT and
MP routing. The flow IDs are plotted on the x-axis and average delays
of the flows are plotted on the y-axis. Plot OPT-25 represents
the 25% 'envelope', that is, the delays of OPT are increased by
25% to obtain the OPT-25 plot. As can be seen, the average delays
of flows under MP routing are within the OPT-25 envelope. Similarly,
in Fig. 10, the delays obtained using MP routing for NET1
are within the 28% envelope of the delays obtained using OPT routing.
We say the delays of MP are 'comparable' to those of OPT if the delays of MP
are within a small percentage of those of OPT.

Fig. 11 compares the average delays of MP and SP for CAIRN.
We observe that the delays of SP for some flows are two to four
times those of MP. In Fig. 12, for NET1, MP routing performs
even better; average delays of SP are as much as five to six times
those of MP routing, which is due to the higher connectivity available
in NET1. Also observe that, because of the load-balancing used in MP,
the plots of MP are less jagged than those of SP. MP routing performs
much better than SP under high-connectivity and high-load
environments. When connectivity is low or network load is light,
MP routing cannot offer any advantage over SP.


Figure  12:  Delays  of  MP  and  SP  in  NET1. 


5.2 Effect of Tuning Parameters $T_l$ and $T_s$

The performance of MP depends on the update intervals $T_l$ and
$T_s$. The setting of $T_l$ and $T_s$, however, is simple. They are local
and can be set independently at each node without affecting convergence,
unlike the global constant $\eta$, which is critical for convergence
of OPT. For CAIRN, Fig. 13 shows the effect of increasing
$T_l$ when $T_s$ and the input traffic are fixed. Observe that when $T_l$ is
increased from 10 to 20 seconds, the delays in SP more than
double, while the delays of MP remain relatively unchanged. This
effect indicates that $T_l$ can be made longer in MP without significantly
affecting performance. This is significant, because sending
frequent update messages consumes bandwidth and can also cause
oscillations under high loads. Similarly, for NET1, delays for SP
increase significantly while there is negligible change in the delays of
MP, as can be observed in Fig. 14. Our new routing
framework provides the means for a trade-off between update
messages and local load-balancing.

Figure 13: Delays when $T_s$ is kept constant and $T_l$ is increased in
CAIRN.

Figure 14: Delays when $T_s$ is kept constant and $T_l$ is increased in
NET1.

Figure 15: Step response in NET1 using OPT and MP routing.

At $T_s$ intervals, the load-balancing heuristics are executed, which
are strictly local computations and require no communication.
Therefore, $T_s$ can be set according to the processing power available
at the router. $T_l$ can be made from a few times to orders of
magnitude greater than $T_s$. In the simplest case, $T_s$ can be set to
the same value as $T_l$ and still gain significant performance, as
shown in Figs. 11 and 12. In the figures, we observe that
MP-TL-10-TS-10 is much closer to OPT than SP-TL-10. Just the long-term
routes with load-balancing, without short-term routing parameter
updates, seem to give significant gains; the major gains here are due
to the mere presence of multiple successors and load-balancing. Our
experience from simulations indicates that a $T_l$ that is only a few
times longer than $T_s$ suffices to gain significant benefits. This
is great news, because it means that fine tuning of $T_l$ and $T_s$ is
not important for our approach to be efficient.
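
To make the roles of the two intervals concrete, the following minimal sketch lists the events one router would schedule over a time horizon. It assumes integer time units and treats the long-term update (which sends messages) and the short-term load-balancing step (which is purely local) as two labeled events. The names `run_schedule` and `message_rounds` are hypothetical and are not part of MP.

```python
def run_schedule(t_l, t_s, horizon):
    """Return the ordered (time, event) pairs for one router.

    t_l: long-term interval; link-cost updates are sent to neighbors.
    t_s: short-term interval; local load balancing, no messages.
    horizon: simulated duration in the same (integer) time unit.
    """
    events = []
    for t in range(0, horizon + 1):
        if t % t_l == 0:
            events.append((t, "long-term"))
        if t % t_s == 0:
            events.append((t, "short-term"))
    return events


def message_rounds(events):
    """Count the events that require communication (long-term updates only)."""
    return sum(1 for _, kind in events if kind == "long-term")
```

In this toy model, doubling `t_l` roughly halves the number of message rounds while the number of local load-balancing steps stays the same, which is the trade-off discussed above.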

5.3  Performance  under  Dynamic  Traffic 

It was stated earlier that OPT responds very poorly to traffic
fluctuations. This becomes evident in Fig. 15, which shows a typical
response in NET1 when the flow rate is a step function (i.e., the
flow rate is increased from 0 to a finite amount at time 0). The
dampened response of the network using MP indicates the fast
responsiveness of MP, making it suitable for dynamic environments.
Because OPT cannot respond fast enough to traffic fluctuations, it is
impossible to find the optimal delays for dynamic traffic. However,
we can find a reasonable lower bound if the input traffic pattern
is predictable, like the pattern shown in Fig. 16, which shows only
one cycle of the input pattern. To obtain a lower bound for this
traffic pattern that represents an 'ideal' OPT (one that has
instantaneous response), we first obtain the lower bound for each
interval during which traffic is steady by running a separate off-line
simulation with the traffic rate that corresponds to that interval,
and then combine the results to obtain the overall lower bound. It is
with this lower bound that we compare the delays of MP. Fig. 17 shows
the average delays of the flows for OPT, MP, and SP routing. The
results indicate that the delays of MP routing are again comparable to
those of an 'ideal' optimal-routing algorithm.

Figure 16: Variable input traffic pattern.
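
One way to read the combination step is as a weighted average of the per-interval bounds. The sketch below assumes, hypothetically, that each steady interval is summarized by its duration, its aggregate traffic rate, and the average delay returned by the off-line OPT run, and that intervals are weighted by the traffic they carry. The weighting rule and the name `combined_lower_bound` are assumptions made for illustration; the text does not specify how the results are combined.

```python
def combined_lower_bound(intervals):
    """Traffic-weighted average of per-interval lower-bound delays.

    intervals: list of (duration, rate, delay) tuples, one per steady
    interval of the input pattern. 'delay' is the average delay from
    the off-line run at that interval's rate (assumed given).
    """
    total_traffic = sum(duration * rate for duration, rate, _ in intervals)
    if total_traffic == 0:
        raise ValueError("no traffic in any interval")
    weighted = sum(duration * rate * delay for duration, rate, delay in intervals)
    return weighted / total_traffic
```

For example, two intervals of equal length with rates 100 and 300 and delays 2.0 and 4.0 give a combined bound of 3.5 under this weighting, since the busier interval carries three times as much traffic.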

Ultimately, MP will be used in real networks where traffic is
bursty at any time-scale; therefore, it is important to see how MP
performs in that environment. We extracted 10 flows from the Internet
traffic traces obtained from LBL [21] and used them as input for the
10 flows in CAIRN. Fig. 18 shows the delays for SP and MP. We do not
perform this simulation with OPT because Internet traffic is too
bursty for OPT to converge. Observe that, except for flows 4, 6, and
8, the delays of MP are much better than those of SP. The SP delays of
these three flows are better than those of MP because of the uneven
distribution of load in the network and the low loads in some sections
of the network; in low-load environments SP can perform slightly
better than MP. This can be easily rectified by modifying IH to use a
small threshold cost for the best link, the crossing of which actually
triggers the load-balancing scheme.
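
The threshold idea can be sketched as a gate in front of the allocation step. The code below is a minimal illustration under stated assumptions: the split used once the threshold is crossed is a simple inverse-cost rule chosen for brevity, not the IH heuristic itself, and the name `allocate` and its parameters are hypothetical. Costs are assumed to be strictly positive.

```python
def allocate(costs, threshold):
    """Return the fraction of traffic sent to each successor.

    costs: dict mapping successor -> cost of routing through it (> 0).
    threshold: cost of the best successor below which (or at which)
               all traffic stays on the best successor, as in SP.
    """
    best = min(costs, key=costs.get)
    if len(costs) == 1 or costs[best] <= threshold:
        # Lightly loaded: behave like single-path routing.
        return {k: (1.0 if k == best else 0.0) for k in costs}
    # Threshold crossed: spread traffic, favoring cheaper successors.
    # Inverse-cost weighting is an illustrative stand-in for IH.
    inverse = {k: 1.0 / c for k, c in costs.items()}
    total = sum(inverse.values())
    return {k: w / total for k, w in inverse.items()}
```

With `costs = {"a": 1.0, "b": 3.0}` and `threshold = 2.0`, all traffic stays on `a`; once the cost through `a` exceeds the threshold, traffic is shared among the successors.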

6  Conclusions 

We have presented a practical approach to near-optimal delay routing
in computer networks. To overcome the limitations of optimal routing
algorithms, we proposed an approximation scheme and suggested
algorithms that implement various components of the approximation.
The resulting framework is implementable in real networks and provides
delays that are close to those obtainable using Gallager's method. An
important element of our framework is our generalization of sufficient
conditions for loop-free routing, which are applicable to any type of
routing algorithm.

Figure 17: Delays under variable traffic in CAIRN.

Figure 18: Delays under Internet traffic in CAIRN.

We presented one of many possible implementations of the new
routing framework. In doing so, we introduced the first link-state
routing algorithm that provides multiple paths that are loop-free
at every instant and that need not be of equal cost. We have shown
through simulations that our implementation of the proposed framework
performs significantly better than single-path routing, and that it
offers delays that are within a small percentage of the lower-bound
delays under stationary traffic. The simulations are by no means
exhaustive, but the results clearly indicate that the framework does
offer potential for obtaining delays that compare with those of
optimal routing.

Additional work is needed to study flow allocation heuristics
that are better suited for specific end-to-end services, e.g., trying
to avoid out-of-order packets for certain flows. Furthermore, our
new routing framework opens up many interesting research opportunities
for quality-of-service (QoS) routing, because the loop-free
invariant conditions on which it is based can be further constrained
to satisfy different types of service. Similarly, because the traffic
allocation heuristics depend on local rather than global parameters,
new heuristics can be defined to account for QoS constraints.


References

[1] R. Albrightson, J.J. Garcia-Luna-Aceves, and J. Boyle. EIGRP - A Fast
Routing Protocol Based on Distance Vectors. Proc. Networld/Interop
94, May 1994.

[2] D. Bertsekas and R. Gallager. Second Derivative Algorithm for
Minimum Delay Distributed Routing in Networks. IEEE Trans. Commun.,
32:911-919, 1984.

[3]  D.  Bertsekas.  Dynamic  Behavior  of  Shortest-Path  Algorithms  for  Com¬ 
munication  Networks.  IEEE  Trans.  Automatic  Control,  27:60-74,  1982. 

[4] J.B. Cain, S.L. Adams, M.D. Noakes, Tom Kryst, and E.L. Althouse. A
Near-Optimum Multiple Path Routing Algorithm for Space-Based SDI
Networks. MILCOM, pages 29.3.1-29.3.7, 1987.

[5]  D.G.  Cantor  and  M.  Gerla.  Optimal  Routing  in  a  Packet-Switched  Com¬ 
puter  Network.  IEEE  Trans.  Computers,  23:1062-1069,  October  1974. 

[6] C.G. Cassandras, M.V. Abidin, and D. Towsley. Distributed Routing with
On-Line Marginal Delay Estimation. IEEE Trans. Commun., 38:348-359,
March 1990.

[7] E.W. Dijkstra and C.S. Scholten. Termination Detection for Diffusing
Computations. Information Processing Letters, 11:1-4, August 1980.

[8]  R.  G.  Gallager.  A  Minimum  Delay  Routing  Algorithm  Using  Dis¬ 
tributed  Computation.  IEEE  Trans.  Commun.,  25:73-84,  January  1977. 

[9]  J.J.  Garcia-Luna-Aceves.  Loop-Free  Routing  Using  Diffusing  Compu¬ 
tations.  IEEE/ACM  Trans.  Networking,  1:130-141,  February  1993. 

[10]  J.J.  Garcia-Luna-Aceves  and  J.  Behrens.  Distributed,  scalable  routing 
based  on  vectors  of  link  states.  IEEE  Journal  on  Selected  Areas  in 
Communications,  October  1995. 

[11]  J.J.  Garcia-Luna-Aceves  and  S.  Murthy.  A  path-finding  algorithm  for 
loop-free  routing.  IEEE/ACM  Trans.  Networking,  February  1997. 

[12] J.J. Garcia-Luna-Aceves and M. Spohn. Scalable link-state internet
routing. Proc. International Conference on Network Protocols, October
1998.

[13]  D.W.  Glazer  and  C.  Tropper.  A  new  metric  for  dynamic  routing  algo¬ 
rithms.  IEEE  Trans.  Commun.,  38:360-367,  March  1990. 

[14] C. Hedrick. Routing Information Protocol. RFC 1058, June 1988.

[15]  J.  M.  Jaffe  and  F.  H.  Moss.  A  Responsive  Distributed  Routing  Algo¬ 
rithm  for  Computer  Networks.  IEEE  Trans.  Commun.,  30:1758-1762, 
July  1982. 

[16] L. Kleinrock. Communication Nets: Stochastic Message Flow and
Delay. McGraw-Hill, New York, 1964.

[17]  D.  Kourkouzelis.  Multipath  Routing  Using  Diffusing  Computations, 
M.S.  Thesis.  University  of  California,  Santa  Cruz,  March  1997. 

[18] J. M. McQuillan, I. Richer, and E. Rosen. The new routing algorithm
for the ARPANET. IEEE Trans. Commun., 28:711-719, May 1980.

[19]  P.  M.  Merlin  and  A.  Segall.  A  Failsafe  Distributed  Routing  Protocol. 
IEEE  Trans.  Commun.,  27:1280-1287,  September  1979. 

[20] J. Moy. OSPF Version 2. RFC 1247, August 1991.

[21]  V.  Paxson,  P.  Danzig,  J.  Mogul,  and  M.  Schwartz.  Web  page: 
ita.ee.lbl.gov/html/traces.html.  Lawrence  Berkeley  National  Labora¬ 
tory,  July  1997. 

[22] M.I. Reiman and A. Weiss. Sensitivity analysis for simulations via
likelihood ratios. Proc. 1986 Winter Simulation Conf., pages 285-289,
1986.

[23]  A.  Segall.  The  Modeling  of  Adaptive  Routing  in  Data  Communication 
Networks.  IEEE  Trans.  Commun.,  25:85-95,  January  1977. 

[24]  A.  Segall  and  M.  Sidi.  A  Failsafe  Distributed  Protocol  for  Minimum 
Delay  Routing.  IEEE  Trans.  Commun.,  29:689-695,  May  1981. 

[25] J. Spinelli and R. Gallager. Event Driven Topology Broadcast without
Sequence Numbers. IEEE Trans. Commun., 37:468-474, 1989.

[26]  Z.  Wang  and  J.  Crowcroft.  Shortest  Path  First  with  Emergency  Exits. 
Proc.  of  ACM  SIGCOMM,  pages  166-176,  1990. 

[27]  W.  T.  Zaumen  and  J.J.  Garcia-Luna-Aceves.  Loop-Free  Multipath  Rout¬ 
ing  Using  Generalized  Diffusing  Computations.  Proc.  IEEE  INFO- 
COM,  March  1998. 


Appendix 

Proof of Lemma 1: Let $A^i = \bigcup_{k \in N^i} A^i_k$, where $A^i_k$
is the set of nodes in $T^i_k$. Since $T^i_k$ is at least an
$(n-1)$-hop minimum tree and node $i$ can appear at most once in each
$A^i_k$, each $A^i_k$ has at least $n-1$ unique elements. Therefore,
$A^i$ has at least $n-1$ elements.

Let $M^i_n$ be the set of the $n-1$ nearest elements to node $i$ in
$A^i$. That is, $M^i_n \subseteq A^i$, $|M^i_n| = n-1$, and for each
$j \in M^i_n$ and $v \in A^i - M^i_n$,

$$\min\{D^i_{jk} + l^i_k \mid k \in N^i\} \le \min\{D^i_{vk} + l^i_k \mid k \in N^i\}.$$

The lemma is proved in the following two parts:

1. Let $G^i_n$ represent the graph constructed by MTU on lines 4
and 5 (i.e., before applying Dijkstra on line 6). For each
$j \in M^i_n$ there is a path $i \rightsquigarrow j$ in $G^i_n$ such
that its length is at most $D^{ij}_{\min}$.

2. After running Dijkstra on $G^i_n$ on line 6 in MTU, the resulting
tree is at least an $n$-hop minimum tree.

Let us first assume Part 1 is true and prove Part 2, and then
proceed to prove Part 1. From the statement in Part 1, for each
node $j \in M^i_n$ there is a path $i \rightsquigarrow j$ in $G^i_n$
with length at most $D^{ij}_{\min}$. After running Dijkstra's
algorithm, we can infer that in the resulting graph there is a path
$i \rightsquigarrow j$ with length at most $D^{ij}_{\min}$. Because
there are $n-1$ nodes in $M^i_n$, the tree constructed has at least $n$
nodes with node $i$ included. Accordingly, it follows from Property
1 that the tree constructed is at least an $n$-hop minimum tree.

Now we prove Part 1. Order the nodes in $M^i_n$ in non-decreasing
order of distance. The proof is by induction on the sequence of
elements in $M^i_n$ as they are added to $G^i_n$. The base case is when
$G^i_n$ contains just one link $l^i_{m_1} = \min\{l^i_k \mid k \in N^i\}$,
where $m_1$ is the first element of $M^i_n$ and
$l^i_{m_1} = D^{i m_1}_{\min}$. Let the statement hold for the first
$m-1$ elements of $M^i_n$ and consider the $m$-th element
$j \in M^i_n$. Let $K$ be the highest priority neighbor for which
$D^i_{jK} + l^i_K = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$. At most
$m-2$ nodes in $T^i_K$ can have a smaller or equal distance than $j$,
which implies that a path $K \rightsquigarrow j$ exists with at most
$m-1$ hops. Let $v$ be the neighbor of $j$ in $T^i_K$. Then the path
$K \rightsquigarrow v \to j$ has at most $m-1$ hops. Because $T^i_K$ is
at least an $(n-1)$-hop minimum tree, the cost of link $v \to j$ must
agree with $G$. Since $D^i_{vK} + l^i_K < D^i_{jK} + l^i_K$, from our
inductive hypothesis there is a path $i \rightsquigarrow v$ in $G^i_n$
such that the length is at most $D^{iv}_{\min}$.

Now we need to show that the preferred neighbor for $v$ is also
$K$, so that the link $v \to j$ will be included in the construction
of $G^i_n$, thus ensuring the existence of the path
$i \rightsquigarrow j$ in $G^i_n$. If some other neighbor $K'$ instead
of $K$ is the preferred neighbor for $v$, then one of the following
two cases should have occurred: (a)
$D^i_{vK'} + l^i_{K'} < D^i_{vK} + l^i_K$, or (b)
$D^i_{vK'} + l^i_{K'} = D^i_{vK} + l^i_K$ and the priority of $K'$ is
greater than the priority of $K$.

Case (a): If $D^i_{vK'} + l^i_{K'} < D^i_{vK} + l^i_K$, then given that
$D^i_{jK} + l^i_K \le D^i_{jK'} + l^i_{K'}$ it follows that the cost of
the path $v \rightsquigarrow j$ in $T^i_{K'}$ is greater than the cost
of $v \to j$ in $G$, which implies that $T^i_{K'}$ is not an
$(n-1)$-hop minimum tree, a contradiction to our assumption.
Therefore,
$D^i_{vK} + l^i_K = \min\{D^i_{vk} + l^i_k \mid k \in N^i\}$.

Case (b): Let $Q_j$ be the set of neighbors that give the minimum
distance to $j$, i.e., for each $k \in Q_j$,
$D^i_{jk} + l^i_k = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$. Similarly,
let $Q_v$ be such that for each $k \in Q_v$,
$D^i_{vk} + l^i_k = \min\{D^i_{vk} + l^i_k \mid k \in N^i\}$. If
$k \in Q_v$ and $k \notin Q_j$, then it follows from the same argument
used in case (a) that $v \rightsquigarrow j$ in $T^i_k$ is greater than
$v \to j$ in $G$, which implies that $T^i_k$ is not an $(n-1)$-hop
minimum tree, a contradiction to our assumption again. Therefore,
$Q_v \subseteq Q_j$. Also, from the same argument used in case (a)
above it can be inferred that $K \in Q_v$. Because $K$ has the highest
priority among all members of $Q_j$ and $Q_v \subseteq Q_j$, and
because $K \in Q_v$, $K$ must also have the highest priority among all
members of $Q_v$. This proves that $v \to j$ will be included in the
construction of $G^i_n$. Because
$D^{iv}_{\min} + d_{vj} = D^{ij}_{\min}$ in $G$, where $d_{vj}$ is the
final cost of link $v \to j$, and the length of $i \rightsquigarrow v$
in $G^i_n$ is at most $D^{iv}_{\min}$ from our inductive hypothesis, we
obtain that the length of $i \rightsquigarrow j$ in $G^i_n$ is at most
$D^{ij}_{\min}$. This proves Part 1 of the lemma.  □
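
The proof relies on each node being assigned a preferred neighbor: the neighbor that minimizes the reported distance plus the adjacent link cost, with ties broken by priority. A minimal sketch of that selection follows. It assumes that a larger number means a higher priority and that distances are exact (integer) values so that ties can be detected by equality; the name `preferred_neighbor` and its argument layout are hypothetical and are not taken from MTU.

```python
def preferred_neighbor(dist_via, link_cost, priority):
    """Pick the preferred neighbor for one destination j at node i.

    dist_via[k]:  distance from neighbor k to j, read from k's tree.
    link_cost[k]: cost of the link from i to neighbor k.
    priority[k]:  tie-breaker; larger value = higher priority (assumed).
    """
    best = min(dist_via[k] + link_cost[k] for k in dist_via)
    # All neighbors that achieve the minimum distance (the set Q_j).
    candidates = [k for k in dist_via if dist_via[k] + link_cost[k] == best]
    return max(candidates, key=lambda k: priority[k])
```

Because the tie-break is deterministic, two destinations whose minimizing sets satisfy $Q_v \subseteq Q_j$, with the winner for $j$ also in $Q_v$, end up with the same preferred neighbor, which is the property Case (b) uses.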

Proof of Theorem 3: Let $t_n$ be the time when $FD^i_j$ is updated
for the $n$-th time. The proof is by induction on the time intervals
$[t_n, t_{n+1}]$. As inductive hypothesis assume that

$$FD^i_j(t) \le D^k_{ji}(t), \quad k \in N^i,\ t < t_n. \tag{25}$$

We show that

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t_n, t_{n+1}]. \tag{26}$$

We observe from the description of MPDA in Fig. 4 that, when
$FD^i_j$ is updated at lines 2b and 3c, $D^i_j$ is also updated at
lines 2a and 3b, respectively. We also observe that $FD^i_j$ is updated
only during state transitions, and regardless of whether the
transition is from PASSIVE to ACTIVE or from ACTIVE to PASSIVE,
Eq. (27) below is true. Note that there is an implicit PASSIVE
state between two back-to-back ACTIVE states.

$$FD^i_j(t_n) \le \min\{D^i_j(t_{n-1}), D^i_j(t_n)\} \tag{27}$$

Let $t'$ be the time when the LSU sent by $i$ at $t_n$ is received and
processed by neighbor $k$. Because of the non-zero propagation delay
across any link, $t'$ is such that $t_n < t' < t_{n+1}$. We then have

$$D^k_{ji}(t') = D^i_j(t_n). \tag{28}$$

Because $FD^i_j$ is modified at $t_n$ and then remains unchanged
within $(t_n, t_{n+1})$, we obtain from Eq. (25) that

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t_n, t'). \tag{29}$$

From Eqs. (27) and (28) we obtain the following.

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t', t_{n+1}) \tag{30}$$

From Eqs. (29) and (30) we have

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t_n, t_{n+1}). \tag{31}$$

At $t_{n+1}$, again from the design of MPDA, we have

$$FD^i_j(t_{n+1}) \le \min\{D^i_j(t_n), D^i_j(t_{n+1})\}. \tag{32}$$

Also, because propagation delays are positive, node $k$ at $t_{n+1}$
cannot yet have the value $D^i_j(t_{n+1})$. So, we have

$$D^k_{ji}(t_{n+1}) = D^i_j(t_n). \tag{33}$$

Combining Eqs. (33) and (32) for time $t_{n+1}$, we get

$$FD^i_j(t_{n+1}) \le D^k_{ji}(t_{n+1}), \tag{34}$$

and Eq. (26) follows from combining Eqs. (31) and (34).


Because $FD^i_j(t_0) \le D^k_{ji}(t_0)$ at initialization, from
induction we have that $FD^i_j(t) \le D^k_{ji}(t)$ for all $t$. Given
that the successor sets are computed based on $FD^i_j$, it follows that
the LFI conditions are always satisfied. According to Theorem 1, this
implies that the successor graph $SG_j$ is always loop-free.  □
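
The role of the feasible distance can be illustrated with a small sketch. It assumes the successor rule takes the form "neighbor $k$ is a successor if its reported distance to $j$ is strictly smaller than $FD^i_j$"; this rule, and the names `successor_set` and `is_loop_free`, are assumptions made for illustration rather than a transcription of MPDA. The second function checks a successor graph for cycles by depth-first search.

```python
def successor_set(feasible_distance, neighbor_distance):
    """Neighbors whose reported distance is strictly below FD (assumed rule)."""
    return {k for k, d in neighbor_distance.items() if d < feasible_distance}


def is_loop_free(successors):
    """Return True if the directed successor graph has no cycle.

    successors: dict mapping node -> set of successor nodes for one
    destination. Nodes that appear only as successors have no out-edges.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GREY
        for v in successors.get(u, ()):
            c = color.get(v, WHITE)
            if c == GREY:          # back edge: a routing loop
                return False
            if c == WHITE and not visit(v):
                return False
        color[u] = BLACK
        return True

    return all(color.get(u, WHITE) != WHITE or visit(u) for u in list(successors))
```

As a usage example, with distances to destination `j` of 3, 2, and 1 at nodes `a`, `b`, and `c`, and each node using its own distance as feasible distance, the successor sets are `a -> {b, c}`, `b -> {c}`, `c -> {j}`, which form an acyclic graph.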


